ESTIMATION OF LATENT GROUP EFFECTS


Abstract 


Conventional methods of multivariate normal analysis do not apply when the variables of interest are not observed directly but must be inferred from fallible or incomplete data. For example, responses to mental test items may depend upon latent aptitude variables, which are modeled in turn as functions of demographic effects in the population. A method of estimating such effects by means of marginal maximum likelihood, implemented by means of an EM algorithm, is proposed. Asymptotic standard errors, likelihood ratio tests of alternative models, and computing approximations are provided. The procedures are illustrated with data for tests from the Armed Services Vocational Aptitude Battery administered to a national probability sample of American youth.
1. INTRODUCTION

Consider a number of multivariate normal populations in the random variable $\theta$, with a common dispersion matrix and with means given by linear functions of the fixed group-effect parameters $\Gamma$. Consumer attitudes in the cells of a multi-way demographic design, for example, might be modeled in terms of only main effects and selected interactions. Maximum likelihood (ML) estimation of $\Gamma$ from samples of $\theta$ from each population is well known, provided it can be assumed that $\theta$ values are measured either without error or with iid normal and unbiased error components (Anderson, 1958).

Less familiar, however, are the procedures to be followed when these assumptions are not tenable. If observations are counts of favorable responses on an opinion survey, for example, the conditional distribution of observed score cannot be independent of expected score under any model with unbiased measurement errors (Lord and Novick, 1968:509). Or, as a second example, observed data may consist of subjects' responses to test items that depend stochastically on latent aptitude parameters through a quantal response model. More generally, we wish to consider situations in which it is not values of $\theta$ that are observed, but values of a secondary random variable $x$ whose distributions depend on $\theta$ through known density functions $p(x \mid \theta)$.

This paper, then, presents a marginal maximum likelihood (MML) solution for $\Gamma$ from $x$, along the lines employed by Bock and Aitkin (1981) to estimate parameters in item response models. The results extend those of Andersen and Madsen (1977) and Sanathanan and Blumenthal (1978), who estimate the mean and variance of a univariate normal latent distribution when $p(x \mid \theta)$ is the one-parameter logistic (Rasch) item response model, and of Andersen (1980), who tests the equality of latent means and variances in the same context.
We begin in Section 2 with a brief review of ML estimation of $\Gamma$ and $\Sigma$, the common dispersion matrix, when values of $\theta$ are observed, or, in the terminology of Dempster, Laird, and Rubin (1977), the "complete data" problem. Section 3 considers the case in which values of $x$ are observed instead, the "incomplete data" problem. The resulting likelihood equations can be solved by cycles of an EM algorithm, which, since the complete-data density belongs to the exponential family, is guaranteed to converge to a local maximum. Computing approximations are presented in Section 4, asymptotic standard errors in Section 5, and likelihood ratio tests of fit in Section 6. Section 7 illustrates the procedures with data from the Profile of American Youth survey (U.S. Department of Defense, 1982).

2.  THE  "COMPLETE  DATA"  SOLUTION 


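We assume $K$ homoscedastic $p$-variate normal distributions in the random variable $\theta$, with common dispersion matrix $\Sigma$ and means $\mu_k$ given as linear functions of $M$ fixed group-effect parameters $\gamma_m$ (each a $p$-vector); that is,

$$\mu_k = \sum_{m=1}^{M} t_{km}\,\gamma_m = \Gamma' t_k, \qquad k = 1, \ldots, K,$$

or, more compactly,

$$\underset{K\times p}{\boldsymbol{\mu}} = \underset{K\times M}{T}\;\underset{M\times p}{\Gamma},$$

where $T$ is a known basis matrix, the $k$th row $t_k'$ of which specifies the dependence of $\mu_k$ on the parameters $\Gamma$, and $\boldsymbol{\mu}$ has rows $\mu_k'$.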

Suppose that samples of $\theta$ of size $N_k$ have been obtained from the $K$ populations. Let $N = \sum_k N_k$, and let $I_{ik}$ be indicators that take the value 1 when observation $i$ is associated with population $k$ and 0 when it is not. The likelihood of the sample is then given as

$$L = \prod_i \prod_k \left[g_k(\theta_i)\right]^{I_{ik}},$$

where

$$g_k(\theta_i) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\!\left[-\tfrac{1}{2}(\theta_i - \Gamma' t_k)'\,\Sigma^{-1}(\theta_i - \Gamma' t_k)\right].$$


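For reference in a following section, we digress briefly to demonstrate that, with population membership known, this density belongs to the exponential family. Writing $\theta$ for the full sample $(\theta_1, \ldots, \theta_N)$ and considering its parameters to be $\Gamma$ and $\Sigma^{-1}$ for convenience, we must show that it can be written in the form

$$f(\theta) = \exp\Big\{\sum A(\Gamma, \Sigma^{-1})\,B(\theta) + C(\theta) + D(\Gamma, \Sigma^{-1})\Big\},$$

where the summation runs over the unique elements of $\Gamma$ and $\Sigma^{-1}$. Letting $(\sigma^{uv})$ represent $\Sigma^{-1}$, this can be done by taking, for $u \le v$,

$$A_{uv}(\Gamma, \Sigma^{-1}) = \begin{cases} -\tfrac{1}{2}\sigma^{uv} & \text{if } u = v, \\ -\sigma^{uv} & \text{if } u \neq v, \end{cases} \qquad B_{uv}(\theta) = \sum_i \theta_{iu}\theta_{iv},$$

and, for each element $\gamma_{su}$ of $\Gamma$, taking

$$A_{\gamma_{su}}(\Gamma, \Sigma^{-1}) = \sum_v \sigma^{uv}\gamma_{sv}, \qquad B_{\gamma_{su}}(\theta) = \sum_i \sum_k I_{ik}\, t_{ks}\,\theta_{iu}.$$

Finally,

$$C(\theta) = 0$$

and

$$D(\Gamma, \Sigma^{-1}) = -\tfrac{1}{2}\sum_k N_k\, t_k'\Gamma\Sigma^{-1}\Gamma' t_k - N \log\big[|\Sigma|^{1/2}(2\pi)^{p/2}\big].$$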


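Continuing with the main argument, we obtain the log likelihood as

$$\log L = \sum_i \sum_k I_{ik} \log g_k(\theta_i) = C - \frac{N}{2}\log|\Sigma| - \frac{1}{2}\sum_i\sum_k I_{ik}(\theta_i - \Gamma' t_k)'\,\Sigma^{-1}(\theta_i - \Gamma' t_k), \tag{2.1}$$

where $C$ does not depend on $\Sigma$ or $\Gamma$.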

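ML estimation proceeds by differentiating (2.1) with respect to $\Gamma$, then equating the results to zero to obtain the likelihood equations

$$\frac{\partial \log L}{\partial \Gamma} = \sum_i \sum_k I_{ik}\, t_k (\theta_i - \Gamma' t_k)'\,\Sigma^{-1} = 0,$$

or, after postmultiplying by $\Sigma$,

$$\sum_k N_k\, t_k \bar\theta_k' = \sum_k N_k\, t_k t_k' \Gamma, \tag{2.2}$$

where

$$\bar\theta_k = N_k^{-1} \sum_i I_{ik}\,\theta_i. \tag{2.3}$$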


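Rewriting (2.2) more compactly, we obtain the likelihood equation for $\Gamma$ as

$$T' D \mathbf{M} = T' D T \Gamma,$$

where

$$D = \operatorname{diag}(N_1, \ldots, N_K)$$

and $\mathbf{M}$ is the $K \times p$ matrix whose $k$th row is $\bar\theta_k'$. Assuming $T$ to be of full column rank $M$ (a condition which, if not satisfied initially, can always be met by reparameterizing in terms of contrasts among the original $\gamma$'s), we obtain

$$\hat\Gamma = (T' D T)^{-1} T' D \mathbf{M}. \tag{2.4}$$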
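A minimal sketch of (2.4) for a univariate latent variable ($p = 1$), written in plain Python so that it has no dependencies. The function names `gamma_hat` and `solve` are hypothetical, and the small Gaussian-elimination solver stands in for whatever linear-algebra routine one would normally use.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gamma_hat(T, samples):
    """Eq. (2.4) for p = 1: T is a K x M basis matrix (list of rows);
    samples[k] holds the observed theta values from population k."""
    K, M = len(T), len(T[0])
    n_k = [len(s) for s in samples]               # diagonal of D
    means = [sum(s) / len(s) for s in samples]    # theta-bar_k, eq. (2.3)
    tdt = [[sum(T[k][i] * n_k[k] * T[k][j] for k in range(K)) for j in range(M)]
           for i in range(M)]                     # T'DT
    tdm = [sum(T[k][i] * n_k[k] * means[k] for k in range(K)) for i in range(M)]  # T'DM
    return solve(tdt, tdm)
```

With a saturated $T$ (as many linearly independent columns as populations) the estimate simply reproduces the cell means; with fewer columns it is their $N_k$-weighted least-squares fit, as the weighting by $D$ in (2.4) implies.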


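Likelihood equations for $\Sigma$ are similarly obtained:

$$\frac{\partial \log L}{\partial \Sigma} = -\frac{N}{2}\left(2\Sigma^{-1} - \operatorname{diag}\Sigma^{-1}\right) + \frac{1}{2}\sum_i\sum_k I_{ik}\Big\{2\Sigma^{-1}(\theta_i - \Gamma' t_k)(\theta_i - \Gamma' t_k)'\Sigma^{-1} - \operatorname{diag}\!\left[\Sigma^{-1}(\theta_i - \Gamma' t_k)(\theta_i - \Gamma' t_k)'\Sigma^{-1}\right]\Big\};$$

equating to zero and simplifying yields

$$\operatorname{diag}\hat\Sigma - 2\hat\Sigma = \operatorname{diag} S - 2S, \tag{2.5}$$

where

$$S = N^{-1}\sum_i\sum_k I_{ik}(\theta_i - \Gamma' t_k)(\theta_i - \Gamma' t_k)'. \tag{2.6}$$

After replacing $\Gamma$ by $\hat\Gamma$ in (2.6), we see from the form of (2.5), whose off-diagonal elements give $-2\hat\sigma_{uv} = -2s_{uv}$ and whose diagonal elements give $-\hat\sigma_{uu} = -s_{uu}$, that

$$\hat\Sigma = S. \tag{2.7}$$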
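Continuing the univariate sketch, (2.6) and (2.7) amount to pooling the squared deviations of each $\theta_i$ from its fitted mean $\Gamma' t_k$ and dividing by $N$. The function `sigma_hat` below is a hypothetical illustration that takes $\hat\Gamma$ (for instance, from (2.4)) as input.

```python
def sigma_hat(T, samples, gamma):
    """S of eq. (2.6) for p = 1, which by (2.7) equals the ML estimate of sigma^2.
    gamma is the vector of group effects, e.g. the solution of (2.4)."""
    n_total = sum(len(s) for s in samples)
    ss = 0.0
    for row, s in zip(T, samples):
        fitted = sum(t * g for t, g in zip(row, gamma))   # Gamma' t_k
        ss += sum((theta - fitted) ** 2 for theta in s)
    return ss / n_total                                   # divisor N, not N - M
```

Because the divisor is $N$ rather than $N - M$, this is the ML estimate rather than the unbiased one.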


It is well known that $\hat\Gamma$ and $\hat\Sigma$ are the unique zeros of the likelihood equations and that they maximize the log likelihood function (2.1).
Note that (2.4) and (2.7) imply that $\mathbf{M}$ and $S$ are jointly sufficient statistics for $\Gamma$ and $\Sigma$. In anticipation of the incomplete data problem, it is instructive to recognize their computation in (2.3) and (2.6) as standard formulas for means and dispersions: Stieltjes integrals taken not over the unknown true density but over an approximation of it, namely, the discrete distribution given by a finite sample of points from the distribution of interest.


3.  THE  "INCOMPLETE  DATA"  SOLUTION 


Suppose that rather than values of $\theta$, we observe values of $x$ which depend on $\theta$ through $p(x \mid \theta)$, densities of known form which may vary from one observation to the next. For example, $x$ may be a vector of discrete values depending on the continuous latent variable $\theta$ through a quantal response model; or, as a second example, $x$ may be equal to $\theta$ plus a random error component, the distributions of which are known but need be neither iid nor normal. Under these assumptions, the marginal likelihood of response $x_i$ obtained from population $k$ is given as

$$h(x_i \mid \Gamma, \Sigma) = \prod_k \left[\int_\theta p(x_i \mid \theta)\, g_k(\theta \mid \Gamma, \Sigma)\, d\theta\right]^{I_{ik}}. \tag{3.1}$$

For notational convenience, we write simply $h(x_i)$ and $g_k(\theta)$ hereafter, the dependence on $\Gamma$ and $\Sigma$ being implicit.

From (3.1), the log marginal likelihood of the samples of $x$ of sizes $N_k$ is given by

$$\log L^* = \sum_i \log h(x_i) = \sum_i \sum_k I_{ik} \log \int_\theta p(x_i \mid \theta)\, g_k(\theta)\, d\theta. \tag{3.2}$$
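To make (3.2) concrete, the sketch below evaluates $\log L^*$ for $p = 1$ by replacing each integral with a midpoint-rule sum over a fine grid. It assumes a deliberately simple measurement model, $x = \theta + e$ with $e \sim N(0, s^2)$ and $s^2$ known, chosen because the marginal density is then $N(\Gamma' t_k, \sigma^2 + s^2)$ in closed form, against which the numerical result can be checked. All names and the grid settings are illustrative assumptions.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_marginal_likelihood(xs, groups, mus, sigma2, err_var, lo=-10.0, hi=10.0, n=2001):
    """log L* of eq. (3.2), integrating p(x|theta) g_k(theta) over a midpoint grid.
    xs: observations; groups: population index k of each observation;
    mus: fitted means Gamma't_k; err_var: known error variance of the assumed model."""
    step = (hi - lo) / n
    grid = [lo + (j + 0.5) * step for j in range(n)]
    total = 0.0
    for x, k in zip(xs, groups):
        h = step * sum(normal_pdf(x, th, err_var) * normal_pdf(th, mus[k], sigma2) for th in grid)
        total += math.log(h)
    return total
```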
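The derivative of $\log L^*$ with respect to $\Gamma$ is then obtained as

$$\begin{aligned}
\frac{\partial \log L^*}{\partial \Gamma} &= \sum_i \frac{\partial \log h(x_i)}{\partial \Gamma} \\
&= \sum_i\sum_k I_{ik}\, h^{-1}(x_i)\int_\theta p(x_i\mid\theta)\,\frac{\partial g_k(\theta)}{\partial\Gamma}\,d\theta \\
&= \sum_i\sum_k I_{ik}\, h^{-1}(x_i)\int_\theta p(x_i\mid\theta)\, g_k(\theta)\, t_k(\theta - \Gamma' t_k)'\,\Sigma^{-1}\,d\theta.
\end{aligned} \tag{3.3}$$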


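Equating to zero yields

$$\sum_i\sum_k I_{ik}\,h^{-1}(x_i)\int_\theta p(x_i\mid\theta)\,g_k(\theta)\,t_k\theta'\,d\theta = \sum_i\sum_k I_{ik}\,h^{-1}(x_i)\int_\theta p(x_i\mid\theta)\,g_k(\theta)\,t_k t_k'\Gamma\,d\theta,$$

or, since $h^{-1}(x_i)\int_\theta p(x_i\mid\theta)\,g_k(\theta)\,d\theta = 1$ for the population $k$ containing observation $i$,

$$\sum_k N_k\, t_k \bar\theta_k^{*\prime} = \sum_k N_k\, t_k t_k'\Gamma, \tag{3.4}$$

where

$$\bar\theta^*_k = \int_\theta \theta\, p_k(\theta\mid X)\,d\theta, \tag{3.5}$$

with

$$p_k(\theta\mid X) = N_k^{-1}\sum_i I_{ik}\,p(\theta\mid x_i) = N_k^{-1}\sum_i I_{ik}\,h^{-1}(x_i)\,p(x_i\mid\theta)\,g_k(\theta) \tag{3.6}$$

being the posterior density of $\theta$ in population $k$, averaged over its observations, given $\Gamma$, $\Sigma$, and the observed data $X$, via Bayes theorem. Rewriting (3.4) more compactly,

$$T'D\mathbf{M}^* = T'DT\Gamma,$$

where $\mathbf{M}^*$ is the $K\times p$ matrix whose $k$th row is $\bar\theta_k^{*\prime}$, from which

$$\hat\Gamma = (T'DT)^{-1}T'D\mathbf{M}^*. \tag{3.7}$$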
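The posterior averaging in (3.5) and (3.6) can be sketched in the same illustrative setting ($p = 1$, assumed normal measurement error with known variance $s^2$, midpoint-rule grid). Each observation's posterior is formed by Bayes theorem and its mean is averaged within population $k$; under this assumed model the exact posterior mean is $\mu_k + \sigma^2(x_i - \mu_k)/(\sigma^2 + s^2)$, which the grid result should reproduce. With $\bar\theta^*_k$ in hand, (3.7) is the same weighted least-squares step as (2.4). Names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posterior_means(xs, groups, mus, sigma2, err_var, K, lo=-10.0, hi=10.0, n=2001):
    """theta-bar*_k of eq. (3.5): the mean of the averaged posterior p_k(theta|X) of (3.6).
    Every population 0..K-1 is assumed to have at least one observation."""
    step = (hi - lo) / n
    grid = [lo + (j + 0.5) * step for j in range(n)]
    sums, counts = [0.0] * K, [0] * K
    for x, k in zip(xs, groups):
        # Bayes theorem: p(theta|x_i) = p(x_i|theta) g_k(theta) / h(x_i)
        joint = [normal_pdf(x, th, err_var) * normal_pdf(th, mus[k], sigma2) for th in grid]
        h = sum(joint)  # the grid spacing cancels in the ratio below
        sums[k] += sum(th * w for th, w in zip(grid, joint)) / h
        counts[k] += 1
    return [s / c for s, c in zip(sums, counts)]
```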


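Similarly, differentiating (3.2) with respect to $\Sigma$ yields

$$\begin{aligned}
\frac{\partial \log L^*}{\partial \Sigma} &= \sum_i \frac{\partial \log h(x_i)}{\partial \Sigma} \\
&= \sum_i\sum_k I_{ik}\Big\{-\tfrac12\left(2\Sigma^{-1} - \operatorname{diag}\Sigma^{-1}\right) + \tfrac12 h^{-1}(x_i)\int_\theta p(x_i\mid\theta)\,g_k(\theta) \\
&\qquad\times\Big[2\Sigma^{-1}(\theta - \Gamma' t_k)(\theta - \Gamma' t_k)'\Sigma^{-1} - \operatorname{diag}\big(\Sigma^{-1}(\theta - \Gamma' t_k)(\theta - \Gamma' t_k)'\Sigma^{-1}\big)\Big]\,d\theta\Big\}.
\end{aligned} \tag{3.8}$$

Equating to zero, then simplifying, leads to

$$\operatorname{diag}\hat\Sigma - 2\hat\Sigma = \operatorname{diag}S^* - 2S^*,$$

where

$$S^* = N^{-1}\sum_k N_k\int_\theta(\theta - \Gamma' t_k)(\theta - \Gamma' t_k)'\,p_k(\theta\mid X)\,d\theta. \tag{3.9}$$

Again it is clear that

$$\hat\Sigma = S^*. \tag{3.10}$$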


Like (2.3) and (2.6), (3.5) and (3.9) are standard formulas for computing means and dispersions from an approximation of an unknown density. Now, however, the approximation is based not on a discrete set of sample points from the distribution but on an average, over observations, of the posterior density of $\theta$ given each observation. These posterior densities are themselves computed via Bayes theorem in (3.6), with the true densities assumed known. Thus the likelihood equations (3.7) and (3.10) constitute a system of implicit equations in $\Gamma$ and $\Sigma$, since they are defined in terms of $\mathbf{M}^*$ and $S^*$, which depend in turn on $\Gamma$ and $\Sigma$ through $h$ and $g_k$.


One approach to solving (3.7) and (3.10), thereby obtaining zeros of the likelihood equations, is the so-called method of successive approximations. That is, $\mathbf{M}^*$ and $S^*$ are computed through (3.5) and (3.9) with provisional estimates $\Gamma_t$ and $\Sigma_t$; improved estimates $\Gamma_{t+1}$ and $\Sigma_{t+1}$ are then obtained by evaluating (3.7) and (3.10) with these new values.

This procedure will be recognized as an application of the EM algorithm, as described by Dempster, Laird, and Rubin (1977), who demonstrated convergence to a maximum of the likelihood function when the complete-data density is a member of the exponential family, as it is in the problem at hand. The notoriously slow convergence of the EM algorithm, which worsens as the densities $p(x \mid \theta)$ become more diffuse, can be largely ameliorated by the use of acceleration techniques such as those described by Ramsey (1975).
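The successive-approximation cycle described above can be sketched as follows, under assumptions made only for illustration: $p = 1$, a separate mean for each population (so $T$ is the identity and (3.7) reduces to $\hat\mu_k = \bar\theta^*_k$), normal measurement error with known variance $s^2$, and integrals replaced by sums over a fixed grid. In that special case the marginal distribution of $x$ is $N(\mu_k, \sigma^2 + s^2)$, so the EM fixed point can be checked against the closed-form maximizer: the group means of $x$, and the pooled within-group variance of $x$ (divisor $N$) minus $s^2$. The function name is hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_cycles(data, err_var, mu0, sigma2_0, cycles=50, lo=-8.0, hi=8.0, n=801):
    """EM for K populations with separate means (T = identity) and a common latent
    variance sigma2, under the assumed model x = theta + e, e ~ N(0, err_var).
    data: list of K lists of observed x values. Returns (means, sigma2)."""
    step = (hi - lo) / n
    grid = [lo + (j + 0.5) * step for j in range(n)]
    mus, sigma2 = list(mu0), float(sigma2_0)
    n_total = sum(len(xs) for xs in data)
    for _ in range(cycles):
        new_mus, ss = [], 0.0
        for k, xs in enumerate(data):
            moments = []
            for x in xs:
                # E-step: posterior of theta given x_i under provisional estimates, as in (3.6)
                w = [normal_pdf(x, th, err_var) * normal_pdf(th, mus[k], sigma2) for th in grid]
                h = sum(w)
                m1 = sum(th * wj for th, wj in zip(grid, w)) / h        # E[theta | x_i]
                m2 = sum(th * th * wj for th, wj in zip(grid, w)) / h   # E[theta^2 | x_i]
                moments.append((m1, m2))
            # M-step: (3.5) and (3.7) with T = I give the new mean directly ...
            mu_new = sum(m1 for m1, _ in moments) / len(xs)
            # ... and the dispersion accumulates E[(theta - mu_new)^2 | x_i], as in (3.9)
            ss += sum(m2 - 2.0 * mu_new * m1 + mu_new ** 2 for m1, m2 in moments)
            new_mus.append(mu_new)
        mus, sigma2 = new_mus, ss / n_total   # (3.10)
    return mus, sigma2
```

No acceleration is applied here; the small error variance keeps the fraction of missing information, and hence the number of cycles needed, modest.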

4.  COMPUTING  APPROXIMATIONS 

Because  closed-form  expressions  for  the  integrals  in  (3.1),  (3.5),  and 
(3.9)  are  not  generally  available,  numerical  approximations  are  required  in 
applying  the  foregoing  solution.  Three  approaches  are  outlined  in  this 
section:  Gauss-Hermite  quadrature,  quadrature  over  fixed  points,  and  Monte 
Carlo  integration. 

For accuracy and stability, Gauss-Hermite quadrature is the preferred method of numerical integration over the normal distribution when $p$ is small. Stroud and Secrest (1966) provide tables of optimal points and weights for the univariate standard normal density, which will be denoted $Z_q$ and $W(Z_q)$, for $q = 1, \ldots, Q$. A grid of points for the $p$-variate standard normal is obtained as the Cartesian product of $p$ univariate sets of points, with weights equal to the products of the weights associated with each element in the vector defining a grid point. That is, a typical point in the grid will take the form

$$Z_q = (Z_{q_1}, \ldots, Z_{q_p})$$

and have an associated weight of

$$W(Z_q) = \prod_{t=1}^{p} W(Z_{q_t}).$$
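A sketch of the product-grid construction, assuming the 3-point Gauss-Hermite rule for the standard normal density (points $0$ and $\pm\sqrt{3}$ with weights $2/3$ and $1/6$) in place of the tabled rules; this rule integrates polynomials up to degree 5 exactly. The function name is hypothetical.

```python
import itertools
import math

# Assumed 3-point Gauss-Hermite rule for the standard normal density.
NODES = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
WEIGHTS = [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]


def product_grid(p, nodes=NODES, weights=WEIGHTS):
    """Cartesian-product grid for the p-variate standard normal.
    Returns (point, weight) pairs; each weight is the product of univariate weights."""
    grid = []
    for idx in itertools.product(range(len(nodes)), repeat=p):
        point = tuple(nodes[i] for i in idx)
        weight = math.prod(weights[i] for i in idx)
        grid.append((point, weight))
    return grid
```

The grid has $Q^p$ points, which is why the text restricts this approach to small $p$.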

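The integrations in (3.1), (3.5), and (3.9) take place over a general $p$-variate normal distribution, necessitating a change of variables of integration in order for Gauss-Hermite quadrature to be employed. We illustrate with (3.1). Let $z = V^{-1}(\theta - \Gamma' t_k)$, where $VV' = \Sigma$ is the Cholesky factorization of $\Sigma$ (implying that $|V| = |\Sigma|^{1/2}$). Then

$$\begin{aligned}
h(x_i) &= \prod_k\Big\{\int_\theta p(x_i\mid\theta)\,\frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\exp\!\big[-\tfrac12(\theta - \Gamma' t_k)'\Sigma^{-1}(\theta - \Gamma' t_k)\big]\,d\theta\Big\}^{I_{ik}} \\
&= \prod_k\Big\{\int_{z} p\big(x_i\mid\theta_k(z)\big)\,\frac{1}{(2\pi)^{p/2}|V|}\exp(-z'z/2)\,|V|\,dz\Big\}^{I_{ik}} \\
&= \prod_k\Big\{\int_{z} p\big(x_i\mid\theta_k(z)\big)\,(2\pi)^{-p/2}\exp(-z'z/2)\,dz\Big\}^{I_{ik}},
\end{aligned}$$

where

$$\theta_k(z) = Vz + \Gamma' t_k.$$

Then

$$h(x_i) \approx \prod_k\Big\{\sum_q p(x_i\mid X_{qk})\,W(X_{qk})\Big\}^{I_{ik}},$$

where

$$X_{qk} = \theta_k(Z_q) \quad\text{and}\quad W(X_{qk}) = W(Z_q).$$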
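The change of variables can be sketched for general $p$: compute the lower-triangular Cholesky factor $V$ of $\Sigma$, map each standard-normal grid point to $X_{qk} = V Z_q + \Gamma' t_k$, and keep its weight. The 3-point rule is again an assumed stand-in for the tabled rules, and `cholesky`, `transformed_grid`, and `marginal_h` are hypothetical names; `p_x_given_theta` is whatever conditional density the application supplies. Because the rule is exact for quadratics, the transformed grid reproduces $\Gamma' t_k$ and $\Sigma$ exactly, which makes a convenient check.

```python
import itertools
import math

# Assumed 3-point Gauss-Hermite rule for the standard normal density.
NODES = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
WEIGHTS = [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]


def cholesky(sigma):
    """Lower-triangular V with V V' = sigma (sigma symmetric positive definite)."""
    p = len(sigma)
    V = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sigma[i][j] - sum(V[i][m] * V[j][m] for m in range(j))
            V[i][j] = math.sqrt(s) if i == j else s / V[j][j]
    return V


def transformed_grid(mu_k, sigma):
    """Points X_qk = V Z_q + mu_k with weights W(X_qk) = W(Z_q); mu_k plays the role of Gamma't_k."""
    p = len(mu_k)
    V = cholesky(sigma)
    out = []
    for idx in itertools.product(range(len(NODES)), repeat=p):
        z = [NODES[i] for i in idx]
        x = [mu_k[r] + sum(V[r][c] * z[c] for c in range(p)) for r in range(p)]
        out.append((x, math.prod(WEIGHTS[i] for i in idx)))
    return out


def marginal_h(p_x_given_theta, mu_k, sigma):
    """Quadrature approximation of the integral of p(x|theta) g_k(theta) over theta."""
    return sum(w * p_x_given_theta(x) for x, w in transformed_grid(mu_k, sigma))
```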


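Computing approximations of $\mathbf{M}^*$ and $S^*$ are obtained similarly as

$$\bar\theta_k^* \approx N_k^{-1}\sum_i I_{ik}\,h^{-1}(x_i)\sum_q X_{qk}\,p(x_i\mid X_{qk})\,W(X_{qk}) = \sum_q X_{qk}\,P^*_{qk}, \tag{4.1}$$

where

$$P^*_{qk} = N_k^{-1}\sum_i I_{ik}\,h^{-1}(x_i)\,p(x_i\mid X_{qk})\,W(X_{qk}),$$

and

$$S^* \approx N^{-1}\sum_k N_k\sum_q (X_{qk} - \Gamma' t_k)(X_{qk} - \Gamma' t_k)'\,P^*_{qk}. \tag{4.2}$$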


The similarity of the computing approximations (4.1) and (4.2) to the complete-data solution given as (2.3) and (2.6) is immediately apparent; means and dispersions are again computed with respect to a discrete approximation of the distribution of interest. This time, however, the discrete approximation is based not on sample points from that distribution but on posterior estimates of its density at selected quadrature points, given the observed data $X$. In contrast to (2.3) and (2.6), (4.1) and (4.2) constitute a system of implicit equations because the weights $P^*_{qk}$ depend on the unknown parameters of the distribution.

An alternative approximation that can offer considerable computational advantage is quadrature over a fixed grid of points. Whereas Gauss-Hermite quadrature computes points anew each cycle in accordance with provisional estimates of $\Gamma$ and $\Sigma$, it is possible to retain the same grid of points for all cycles and thereby avoid recomputing $p(x_i \mid X_q)$ every cycle. A grid of points $X_q$ is selected a priori to span a region where the preponderance of the population distributions is believed to lie. New weights are computed in each cycle from provisional estimates of $\Gamma$ and $\Sigma$ as follows:

$$w_k(X_q) = \exp\!\left[-\tfrac{1}{2}(X_q - \Gamma' t_k)'\,\Sigma^{-1}(X_q - \Gamma' t_k)\right].$$

The computing approximations (4.1) and (4.2) remain unchanged except for the substitutions of $X_q$ for $X_{qk}$ and $w_k(X_q)$ for $W(X_{qk})$. When the grid is well chosen, estimates of $\Gamma$ and $\Sigma$ will agree well with those computed via Gauss-Hermite quadrature. When the points are poorly chosen, however, loss of accuracy and/or stability can result.
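A sketch of the fixed-grid weights for $p = 1$. One assumption goes beyond the text: the weights are rescaled to sum to one, so that they can stand in for $W(X_{qk})$ in (4.1) and (4.2) and in the quadrature form of $h(x_i)$. The grid and the function name are illustrative.

```python
import math


def fixed_grid_weights(grid, mu_k, sigma2):
    """w_k(X_q) = exp[-(X_q - mu_k)^2 / (2 sigma2)] on a fixed univariate grid,
    rescaled to sum to one (the rescaling is an assumption of this sketch)."""
    raw = [math.exp(-0.5 * (x - mu_k) ** 2 / sigma2) for x in grid]
    total = sum(raw)
    return [r / total for r in raw]


# Points chosen once, a priori, and reused in every cycle.
GRID = [-4.0 + 0.1 * q for q in range(81)]
```

Only the weights change from cycle to cycle; the conditional densities $p(x_i \mid X_q)$ at these points could be computed once and stored.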

A second alternative, which can prove useful when a large $p$ renders quadrature over a grid cumbersome, is Monte Carlo integration. In each cycle, $Q$ random points $X_{qk}$ are generated for each population $k$ in accordance with provisional estimates of $\Gamma$ and $\Sigma$. The computing formulas (4.1) and (4.2) remain unchanged except that

$$W(X_{qk}) = 1/Q, \qquad k = 1, \ldots, K.$$
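A Monte Carlo sketch for $p = 1$: draw $Q$ points from the provisional $N(\Gamma' t_k, \sigma^2)$ and give each the weight $1/Q$. The seeded generator, the function name, and the default $Q$ are illustrative choices, not part of the method.

```python
import math
import random


def mc_marginal(p_x_given_theta, mu_k, sigma2, Q=20000, seed=1):
    """Monte Carlo version of h: Q draws X_qk from N(mu_k, sigma2), each with weight 1/Q."""
    rng = random.Random(seed)
    draws = [rng.gauss(mu_k, math.sqrt(sigma2)) for _ in range(Q)]
    return sum(p_x_given_theta(x) for x in draws) / Q
```

The error of such an estimate shrinks only as $1/\sqrt{Q}$, so Monte Carlo trades precision for freedom from the $Q^p$ growth of a product grid.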

5.  ASYMPTOTIC  STANDARD  ERRORS 


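Following Bock and Lieberman (1970), we may approximate the inverse of the asymptotic covariance matrix of the estimators $\hat\Gamma$ and $\hat\Sigma$ by

$$H = \sum_i\left[\frac{\partial \log h(x_i)}{\partial \xi}\right]\left[\frac{\partial \log h(x_i)}{\partial \xi}\right]'\,\Bigg|_{\xi = \hat\xi},$$

where $\xi$ represents the $Mp$ elements of $\Gamma$ and the $p(p+1)/2$ nonredundant elements of $\Sigma$ written as a single vector. Large-sample standard errors are obtained as the square roots of the diagonal elements of $H^{-1}$. Expressions necessary for the evaluation of $H$ are found in (3.3) and (3.8). Using the univariate case ($p = 1$, $\Sigma = \sigma^2$) as an illustration, the required gradient vectors, gramian products of which are summed over observations to produce $H$, are

$$\begin{aligned}
\frac{\partial \log h(x_i)}{\partial \gamma_m} &= \sigma^{-2}\sum_k I_{ik}\,h^{-1}(x_i)\int_\theta p(x_i\mid\theta)\,g_k(\theta)\,(\theta - \Gamma' t_k)\,t_{km}\,d\theta \\
&\approx \sigma^{-2}\sum_k I_{ik}\,h^{-1}(x_i)\sum_q p(x_i\mid X_{qk})\,W(X_{qk})\,(X_{qk} - \Gamma' t_k)\,t_{km}
\end{aligned}$$

and

$$\begin{aligned}
\frac{\partial \log h(x_i)}{\partial \sigma^2} &= -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_k I_{ik}\,h^{-1}(x_i)\int_\theta p(x_i\mid\theta)\,g_k(\theta)\,(\theta - \Gamma' t_k)^2\,d\theta \\
&\approx -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_k I_{ik}\,h^{-1}(x_i)\sum_q p(x_i\mid X_{qk})\,W(X_{qk})\,(X_{qk} - \Gamma' t_k)^2.
\end{aligned}$$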

6.  TESTS  OF  FIT 


Consider  two  competing  models  for  a  given  data  set,  with  Model  1 
nested  within  Model  2.  In  large  samples,  the  fit  of  the  two  models  can  be 
compared  by  means  of  the  statistic 


$$\chi^2 = -2\log\left(L^*_1 / L^*_2\right), \tag{6.1}$$


which,  when  Model  1  is  correct,  follows  a  chi-square  distribution  with  degrees 
of  freedom  equal  to  the  number  of  additional  parameters  in  Model  2. 
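A sketch of the likelihood ratio comparison (6.1), with the chi-square tail probability computed from the standard recurrence for the regularized upper incomplete gamma function, so that no statistics library is needed (integer degrees of freedom only). The function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1, via the recurrence
    Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1) on the regularized gamma Q."""
    y = x / 2.0
    if df % 2 == 1:
        s, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    else:
        s, q = 1.0, math.exp(-y)              # Q(1, y)
    while s < df / 2.0:
        q += y ** s * math.exp(-y) / math.gamma(s + 1.0)
        s += 1.0
    return q


def lr_test(loglik_1, loglik_2, extra_params):
    """Nested Model 1 (log L*_1) against Model 2 (log L*_2): statistic and p-value."""
    stat = -2.0 * (loglik_1 - loglik_2)
    return stat, chi2_sf(stat, extra_params)
```

Because the chi-squares in Table 3 are each taken against the same multinomial alternative, the difference between two nested rows, such as `chi2_sf(115.92 - 111.12, 1)` for adding the interaction, can be referred to this tail probability.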

When the number of potential responses $x$ is small compared to the sample size, it is possible to compare the fit of a given model to a general multinomial alternative. First, the universe of potential responses $x_\ell$ is partitioned into mutually exclusive and exhaustive classes such that the potential responses of a given observation constitute exactly one class. If a test with several parallel forms is administered, for example, each class of responses will consist of all possible response vectors to the items in a given test form. Let $r(x_{\ell k})$ be the count of response $x_\ell$ observed in population $k$, and let $N(x_{\ell k})$ be the total number of responses from the same class as $x_\ell$ that are observed in population $k$. Then the statistic

$$-2\sum_\ell\sum_k r(x_{\ell k})\log\left[\frac{N(x_{\ell k})\,h(x_{\ell k})}{r(x_{\ell k})}\right] \tag{6.2}$$

[with terms for which $r(x_{\ell k}) = 0$ set to zero] will follow a chi-square distribution in large samples when the model is correct, with degrees of freedom equal to the number of nonzero $r(x_{\ell k})$ terms minus the number of parameters estimated in the model minus $K$. Note that the difference between the values of (6.2) for two nested models takes the same value, and has the same degrees of freedom, as a direct comparison via (6.1).
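A sketch of (6.2) for the common case in which each population's responses form a single class, so that $N(x_{\ell k})$ is simply $N_k$ (as with the four-item patterns of Table 1). Fitted probabilities $h(x_{\ell k})$ would come from the marginal likelihood of the fitted model; here they are supplied as inputs, and the function names are hypothetical.

```python
import math


def g2_statistic(observed, fitted_probs):
    """Eq. (6.2): observed[k][l] = r(x_lk); fitted_probs[k][l] = h(x_lk).
    Assumes one class per population, so N(x_lk) = N_k."""
    g2 = 0.0
    for counts, probs in zip(observed, fitted_probs):
        n_k = sum(counts)
        for r, h in zip(counts, probs):
            if r > 0:                          # terms with r = 0 are set to zero
                g2 += -2.0 * r * math.log(n_k * h / r)
    return g2


def g2_df(observed, n_params):
    """Nonzero cells minus estimated parameters minus K (one class per population)."""
    nonzero = sum(1 for counts in observed for r in counts if r > 0)
    return nonzero - n_params - len(observed)
```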

Following Andersen (1980), we may test the equality of dispersion matrices across groups in a two-step procedure. First, means and dispersion matrices are estimated in all groups separately, and the product of the likelihoods resulting from these separate analyses is accumulated. Second, separate means and a common dispersion matrix are estimated by employing an identity matrix as the basis matrix $T$ and proceeding as described in Section 3. The resulting likelihood may be compared with the first via (6.2) to obtain a large-sample chi-square test of the equality of dispersion matrices over groups, with degrees of freedom equal to $(K - 1)p(p + 1)/2$.

7.  A  NUMERICAL  EXAMPLE 

Item response models in psychometrics express the probability of a given response to a test item as a function of a subject's latent ability parameter $\theta$ and one or more parameters that characterize the regression of the item response on ability. The three-parameter logistic item response model (Birnbaum, 1968) for dichotomous items, for example, gives the probability of a correct response to item $j$ from subject $i$ as

$$P(x_{ij} = 1 \mid \theta_i, a_j, b_j, c_j) \equiv P_{ij} = c_j + (1 - c_j)\,\frac{\exp[1.7a_j(\theta_i - b_j)]}{1 + \exp[1.7a_j(\theta_i - b_j)]} \tag{7.1a}$$

and the probability of an incorrect response as

$$P(x_{ij} = 0 \mid \theta_i, a_j, b_j, c_j) = 1 - P_{ij}, \tag{7.1b}$$

where $x_{ij}$ denotes the response, 1 if correct and 0 if not; $\theta_i$ is the ability of subject $i$; and $a_j$, $b_j$, and $c_j$ are parameters that characterize item $j$: $a_j$, the slope parameter, reflects the reliability of the item; $b_j$, the threshold, reflects its difficulty; and $c_j$, the lower asymptote, reflects the minimal probability of a correct response even from subjects with extremely low abilities. Under the usual assumption of local or conditional independence, the probability of a pattern of responses from subject $i$ to a number of items is given by the product over items of expressions like (7.1):

$$P(x_i \mid \theta_i) = \prod_j P(x_{ij} \mid \theta_i, a_j, b_j, c_j). \tag{7.2}$$
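A direct transcription of (7.1) and (7.2), using the item parameters of Table 2. The function names are hypothetical. Since the 16 patterns of four items are exhaustive, their probabilities at any $\theta$ sum to one, which serves as a check.

```python
import math

# Item parameters (a, b, c) for the four ASVAB items of Table 2.
ITEMS = [(1.27, -0.13, 0.22), (1.45, 0.42, 0.34), (2.49, 0.71, 0.31), (2.27, 0.62, 0.20)]


def p_correct(theta, a, b, c):
    """Three-parameter logistic model, eq. (7.1a), with the 1.7 scaling constant."""
    z = 1.7 * a * (theta - b)
    return c + (1.0 - c) / (1.0 + math.exp(-z))   # exp(z)/(1+exp(z)) = 1/(1+exp(-z))


def pattern_prob(pattern, theta, items=ITEMS):
    """P(x_i | theta) of eq. (7.2) under local independence; pattern is a tuple of 0/1."""
    prob = 1.0
    for x, (a, b, c) in zip(pattern, items):
        p = p_correct(theta, a, b, c)
        prob *= p if x == 1 else 1.0 - p          # eq. (7.1b) for incorrect responses
    return prob
```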

In most applications, item response models are used to estimate the latent abilities of individuals. With values of item parameters assumed known (generally estimated from a large sample of subjects), one may obtain a maximum likelihood estimate $\hat\theta$ for a given response vector by maximizing (7.2) as a function of $\theta$, and a large-sample standard error as the square root of the negative reciprocal of the second derivative of the natural logarithm of (7.2), evaluated at the mle $\hat\theta$.
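A sketch of the individual-level estimate just described, using a golden-section search on $[-4, 4]$ and a central finite difference for the second derivative. The search presumes the log likelihood is unimodal on that interval, an assumption that need not hold for every 3PL response pattern. Item parameters are those of Table 2; names are hypothetical.

```python
import math

ITEMS = [(1.27, -0.13, 0.22), (1.45, 0.42, 0.34), (2.49, 0.71, 0.31), (2.27, 0.62, 0.20)]


def loglik(theta, pattern, items=ITEMS):
    """Natural log of (7.2) under the 3PL model (7.1)."""
    total = 0.0
    for x, (a, b, c) in zip(pattern, items):
        p = c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))
        total += math.log(p if x == 1 else 1.0 - p)
    return total


def theta_mle(pattern, lo=-4.0, hi=4.0, tol=1e-8):
    """Golden-section search for the maximizer of loglik on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    left, right = lo, hi
    while right - left > tol:
        c1, c2 = right - g * (right - left), left + g * (right - left)
        if loglik(c1, pattern) < loglik(c2, pattern):
            left = c1     # maximum lies in [c1, right]
        else:
            right = c2    # maximum lies in [left, c2]
    return (left + right) / 2.0


def theta_se(theta, pattern, h=1e-4):
    """Square root of the negative reciprocal of a finite-difference second derivative."""
    d2 = (loglik(theta + h, pattern) - 2.0 * loglik(theta, pattern)
          + loglik(theta - h, pattern)) / h ** 2
    return math.sqrt(-1.0 / d2)
```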

There are several reasons not to approximate the distribution of $\theta$ in a population by the distribution of $\hat\theta$, or to carry out ANOVA procedures on values of $\hat\theta$ to estimate group effects on the means of $\theta$ in various subpopulations. First, values of $\hat\theta$ are estimated with varying precision, thereby violating the assumptions upon which standard ANOVA procedures are based. Second, estimation of $\theta$ from certain response patterns is problematic: patterns with all correct or all incorrect responses, along with most patterns with total scores below chance level (the sum of the $c_j$'s over the items a subject has been presented), yield infinite mle's. Deleting the data of subjects with such patterns biases estimates of the population means and variances, while assigning them finite values, either arbitrarily or by incorporating prior information, introduces biases into the estimation of the $\theta$'s themselves. Third, stable estimation of individuals' $\theta$'s requires at least 15 or 20 responses per subject, thereby precluding the use of more efficient sampling designs that would be preferred when only population-level parameters are of interest. The methods introduced in the preceding sections suffer none of these deficiencies.
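The second point can be checked numerically with the Table 2 items: for the all-correct pattern every factor in (7.2) increases with $\theta$, and for the all-incorrect pattern every factor decreases, so the log likelihood is monotone and has no finite maximizer. The grid of $\theta$ values and the function names below are illustrative.

```python
import math

ITEMS = [(1.27, -0.13, 0.22), (1.45, 0.42, 0.34), (2.49, 0.71, 0.31), (2.27, 0.62, 0.20)]


def loglik(theta, pattern, items=ITEMS):
    """Natural log of (7.2) under the 3PL model (7.1)."""
    total = 0.0
    for x, (a, b, c) in zip(pattern, items):
        p = c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))
        total += math.log(p if x == 1 else 1.0 - p)
    return total


def is_increasing(pattern, thetas):
    """True if loglik strictly increases along the given sequence of theta values."""
    vals = [loglik(t, pattern) for t in thetas]
    return all(v1 < v2 for v1, v2 in zip(vals, vals[1:]))


THETAS = [-6.0 + 0.5 * j for j in range(25)]   # -6.0, -5.5, ..., 6.0
```

In Table 1 the all-correct pattern is the most frequent one among white males (86 of 264), so how such patterns are handled matters in this example.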

As an example, we consider data from the Profile of American Youth, a survey of the aptitudes of a sample of the population of Americans aged 16 through 23 in July 1980 (U.S. Department of Defense, 1982). Table 1 presents counts of the sixteen possible response patterns to four items from the Arithmetic Reasoning test of the Armed Services Vocational Aptitude Battery (ASVAB), Form 8A, as observed in samples of white males, white females, black males, and black females. The parameters of these items under the three-parameter logistic item response model, shown in Table 2, were estimated from a sample of 1,178 cases from the 11,787 available, using the BILOG computer program (Mislevy and Bock, 1982).

Tables 3 and 4 present the results of fitting a series of nested models to the data of Table 1. Examination of the differences between likelihood ratio chi-squares against the general multinomial alternative suggests, to begin with, that within-group variation may not be homogeneous. Continuing the example for purposes of illustration, we find strong evidence for a race effect and, to a lesser extent, for sex and interaction effects.
TABLE 1

COUNTS OF OBSERVED RESPONSE PATTERNS

ITEM RESPONSE    WHITE    WHITE      BLACK    BLACK
1  2  3  4       MALES    FEMALES    MALES    FEMALES
0  0  0  0         23       20         27       29
0  0  0  1          5        8          5        8
0  0  1  0         12       14         15        7
0  0  1  1          2        2          3        3
0  1  0  0         16       20         16       14
0  1  0  1          3        5          5        5
0  1  1  0          6       11          4        6
0  1  1  1          1        7          3        0
1  0  0  0         22       23         15       14
1  0  0  1          6        7         10       10
1  0  1  1         19        6          1        2
1  1  0  0         21       18          7       19
1  1  0  1         11       15          9        5
1  1  1  0         23       20         10        8
1  1  1  1         86       42          2        4
TOTAL             264      227        141      147

TABLE 2

ITEM PARAMETERS

ITEM      a       b       c
1       1.27    -.13     .22
2       1.45     .42     .34
3       2.49     .71     .31
4       2.27     .62     .20

TABLE 3

PARAMETER ESTIMATES AND FIT STATISTICS

EFFECTS IN MODEL                      GRAND MEAN   RACE        SEX         INTERACTION   VARIANCE    CHI-SQUARE
GRAND MEAN                            .02 (.05)    —           —           —             .85 (.12)   223.77
GRAND MEAN, SEX                       .02 (.05)    —           .29 (.09)   —             .83 (.12)   213.31
GRAND MEAN, RACE                      -.11 (.06)   .92 (.11)   —           —             .66 (.11)   124.14
GRAND MEAN, RACE, SEX                 -.11 (.06)   .91 (.11)   .24 (.09)   —             .65 (.10)   115.92
GRAND MEAN, RACE, SEX, INTERACTION    -.11 (.06)   .90 (.11)   .13 (.11)   .42 (.21)     .65 (.10)   111.12
UNCONSTRAINED MEANS, UNCONSTRAINED VARIANCES (VARIANCES = 1.06, .63, .39, .27)                        100.57


TABLE 4

FITTED MEANS

EFFECTS IN MODEL                               WHITE MALES   WHITE FEMALES   BLACK MALES   BLACK FEMALES
GRAND MEAN                                     .02           .02             .02           .02
GRAND MEAN, SEX                                .16           -.13            .16           -.13
GRAND MEAN, RACE                               .35           .35             -.57          -.57
GRAND MEAN, RACE, SEX                          .47           .22             -.44          -.69
GRAND MEAN, RACE, SEX, INTERACTION             .51           .16             …             …
UNCONSTRAINED MEANS, UNCONSTRAINED VARIANCES   .49           .17             -.46          -.37